Evaluation of spatial Bayesian Empirical Likelihood models in analysis of small area data

Bayesian empirical likelihood (BEL) models are becoming increasingly popular as an attractive alternative to fully parametric models. However, they have only recently been applied to spatial data analysis for small area estimation. This study considers the development of spatial BEL models using two popular conditional autoregressive (CAR) priors, namely BYM and Leroux priors. The performance of the proposed models is compared with their parametric counterparts and with existing spatial BEL models using independent Gaussian priors and generalised Moran basis priors. The models are applied to two benchmark spatial datasets, simulation study and COVID-19 data. The results indicate promising opportunities for these models to capture new insights into spatial data. Specifically, the spatial BEL models outperform the parametric spatial models when the underlying distributional assumptions of data appear to be violated.


Introduction
Bayesian Empirical Likelihood (BEL) was first described by Lazar [1] based on early work on Empirical Likelihood (EL) by Owen. [2]. The concept of EL has been utilised in Bayesian analysis in a few instances since then [3][4][5]. As discussed in section 2.1, BEL provides a flexible semi-parametric approach using data to approximate the likelihood part of the Bayesian posterior combined with parametric prior distributions.
In the context of spatially dependent data, an EL framework has been developed in a frequentist set-up by [6][7][8][9]. BEL semi-parametric approaches for modelling areal spatial data were introduced by Chaudhuri and Ghosh [10] and Porter et al. [11,12] extending small area estimation models [13]. These BEL spatial approaches utilise a Bayesian hierarchical framework, moving the spatial dependence structure to the parameter model of the hierarchy and applying EL at the observation level [11]. Chaudhuri and Ghosh [10] introduced area-level and unit-level models which can handle both discrete and continuous data, with informative priors for the spatial random effects as independent Gaussian and Dirichlet process mixture priors. Porter et al. [11] provided a more complete version of this model and suggested that it could be generalised to spatial-temporal dependencies. Porter et al. [11] formulated a different set of lattice priors utilising the generalised Moran basis [14] for the spatial dependencies. The multivariate extension of the Bayesian semi-parametric hierarchical empirical likelihood (BSHEL) model of [11] can be found in Porter et al. [12]. The BEL models described above focused on a selected set of priors to represent spatial dependence. However, a class of very popular spatial priors, namely the conditional autoregressive (CAR) priors for the spatial random effects in areal data analysis [15,16] have still not been explored in the BEL framework. This article aims to address this research gap by formulating BEL semi-parametric spatial models applying CAR structure priors for spatial random effects.
The CAR priors have gained in popularity for modelling spatial data and have been employed in many applications of spatial modelling, such as disease mapping [17]. Besag et al. [18] introduced the CAR prior, which has become known as the BYM prior. There are now many variants of this prior and corresponding model such as the Cressie model [19], the Leroux model [20] and the Lu model [21]. For a detailed comparison of different CAR priors in spatial analysis under Bayesian parametric framework, see Lee, D. [15] and Rampaso et al. [16]. The Leroux model has been termed as a flexible CAR structure for modelling spatial random effects, since it consists of a spatial dependence parameter (ρ) taking different values according to the underlying spatial autocorrelation present in the data. Special cases of the Leroux model give rise to independent Gaussian (IG) priors for spatial random effects, when no spatial structure is needed for modelling areal data (ρ = 0) and intrinsic conditional autoregressive (ICAR) priors (ρ = 1) (which is the spatial prior considered in the BYM model [18]).
The present study develops spatial BEL (SBEL) models for the two popular CAR prior choices, Leroux and BYM. The proposed SBEL-CAR models are illustrated by analysing areal data on an irregular lattice using two benchmark examples on Scottish lip cancer [22] and North Carolina Sudden Infant Death Syndrome (SIDS) [23] and on a very recent example using a COVID-19 dataset for Europe. The performance of the proposed models and other existing spatial BEL models and their parametric counterparts are compared. The models are also illustrated using simulated datasets.
The development of the proposed models is achieved by extending the Bayesian semiparametric hierarchical empirical likelihood (BSHEL) model proposed by Porter et al. [11]. A recap of the BEL models, BEL spatial models and Bayesian parametric spatial models is given in Section 2. Section 3 contains the formulation of SBEL-CAR models with an algorithm to obtain the posterior samples of interest. The application of the SBEL-CAR models to case study datasets and simulated datasets is reported in section 4 followed by a final discussion in section 5.

Background on BEL and SBEL models
This section briefly discusses the background of Bayesian Empirical Likelihood (BEL) and SBEL models from the existing literature. differentiable statistical functional. These methods were framed as a non-parametric extension of Wilk's [24] theorem for parametric likelihood ratio tests. Later EL was expanded to all types of estimating equations by Qin and Lawless [25].
For data points y 1 , y 2 , . . ., y n from some unknown distribution F, let some functional of F, say θ(F) be the parameter of interest for inference, which can be determined by the estimating equation f(y i , θ), the EL function can be defined as [26]: where w i satisfies the constraints P n i¼1 w i ¼ 1; P n i¼1 w i f i ðy i ; θÞ ¼ 0. Constrained optimisation of the EL ratio function is carried out in order to obtain the EL weights w i which are used as the data likelihood [26].
The introduction to estimating equations to EL enhanced the scope of EL to so many different applications including generalised linear models [27], time series [7,28], econometrics [29], regression analysis [30,31], survival analysis [32] and many more. For more details on scope and benefits of EL, we refer the reader to Lazar [33], which reviewed EL from its initiation to current developments in theory and applications along with the potential for future development.

Bayesian Empirical Likelihood.
In a Bayesian framework, the likelihood is used to update a prior distribution and yield posterior inference. Lazar [1] argued that EL can be used in place of a density and, when multiplied by the prior of the parameter of interest can yield the posterior distribution in such an analysis. The author explored the characteristics of Bayesian inference using EL instead of a parametric density. Starting from a well-known parametric case of a Bayesian posterior of a parameter vector θ, Schennach [3] derived an EL posterior using the weights attributed to the sample points, which can be calculated by solving an entropy maximisation problem, where p(θ) is a given prior on θ and ðw � 1 ðθÞ; . . . ; w � n ðθÞÞ is the solution of: where f i (y i , θ) represents the estimating equations of interest. The estimating equations can be formulated using the moment conditions [3,5]. Schennach [3] showed that for large enough samples, BEL offered similar results to Bayesian bootstrap. Grendar and Judge [34] demonstrated that the BEL is an asymptotic approximation of the Bayesian maximum a posteriori probability estimators, which provided additional justification of using EL in a Bayesian setting [33]. As a result, BEL has been applied in quantile regression [35], ridge and lasso regression [36], inference with complex survey data [37], spatial data analysis [10][11][12]38] and so on. The spatial analysis using BEL is the focus of this study.

Recap of spatial BEL models
BEL has been employed for spatial analysis [10][11][12] and has been found to provide precise estimation of small area effects. Chaudhuri and Ghosh [10] introduced area-level and unit-level models using BEL by extending the traditional Fay-Herriot (FH) model [13] using an EL framework. Following this work, Porter et al. [11] provided a general hierarchical Bayesian framework incorporating empirical likelihood methodology for the data model and developed a spatial FH model used for small area estimation (SAE). The FH model for SAE can be written as: where Y i , i = 1, 2, . . ., n, is a design unbiased estimate of μ i , and � i is an unstructured error component, X i is the vector of covariate information for area i, β is the vector of fixed covariate effects, β = (β 0 , β 1 , . . ., β p ) 0 and ψ i is spatially referenced random effect for area i. Three forms of priors for the vector of spatial random effects ψ were specified by these authors. [10] suggested two options, the first being an independent and identically distributed (iid) Gaussian distribution (IG) with variance A following a Inverse Gamma distribution The second option was a Dirichlet process (DP) with a Gaussian base, where DPða; GÞ represents a Dirichlet process with precision parameter α and a base measure In contrast, Porter et al. [11] suggested the use of a generalised Moran basis [14], where, B is an adjacency matrix for a first order Intrinsic Gaussian Markov Random Field (IGMRF) (rank (B) = n − 1), B + is a diagonal matrix with {B + } i,i = ∑ j2ne(i) b ij , b ij = 1 if i and j are adjacent and 0 otherwise, where j 2 ne(i) means that area j is a neighbour of area i. The vector M is the set of eigen-vectors corresponding to the non-zero eigenvalues of the matrix P c BP 0 c with P c = I − X(X 0 X) −1 X 0 , q is the number of non-zero eigenvalues of the matrix P c BP c and ψ = Mψ � . The prior for the precision parameter τ can be specified as Gamma with hyperparameters chosen to be equal to 1 [11]. The prior for the vector of the fixed covariate effects is specified as where g represents the Zellner prior [39] evaluated at a fixed point estimate 10 [10]. In the prior specification for fixed effects β of [11], A is replaced by τ −1 . For estimation of the fixed effects and the random effects, BEL was used in spite of having a parametric distribution for Y i . The estimating equations for (β, ψ) are: Details of the MCMC sampling algorithm using a random walk Metropolis-Hastings (MH) approach can be found in [11].

Recap of Bayesian parametric spatial models
For parametric modelling of disease incidence or mortality, the response variable is usually assumed to have a Poisson distribution with an expected value that can be explained by a function of covariates and spatial random effects. Gaussian distributions are also commonly used for modelling continuous response variables such as standardised incidence ratios (SIRs) on a logarithmic scale [40] and binomial distributions are used for proportions [41]. There is a wide range of spatial prior formulations in the literature, including basis functions, deformation methods, Gaussian Markov Random Field (GMRF) methods etc. [42]. A very popular class of priors to represent these random effects is the conditional autoregressive (CAR) models, which are a special case of the GMRF methods. Different model formulations using different CAR models are available in the literature [15,43]. A short recap of BYM, Cressie, Leroux and generalised Moran basis priors are provided in this section.

The BYM model.
A very well-known Bayesian hierarchical model for disease mapping was proposed by Besag et al. [18], known as the BYM model. The spatial random effects ψ i (from Eq 4) comprise a spatial random effect term u i with a CAR prior structure and an unstructured random component v i . The conditional distribution of each u i can be expressed as: where w ij are the weights defining the relationship between area i and its neighbours and the prior mean is a weighted average of the other u j [18]. A popular specification of the weights is the intrinsic CAR prior in which w ij = 1, if area i and j are adjacent and w ij = 0 otherwise. The prior for unstructured random component v i is typically considered to have an independent normal distribution, v i � Nð0; s 2 v Þ: The above model specification with both structured and unstructured random components (u i and v i ) with or without covariates is also known as a convolution model.
If only a structured component u i is used to express the spatial dependence (e.g. in Eq 4 ψ i = u i ), then the model is termed an intrinsic model, which is the simplest possible CAR prior that does not estimate the strength of the spatial correlation between the random effects [15]. Intrinsic CAR distributions are commonly used to model the spatial dependency structure in Bayesian hierarchical models, such as the intrinsic Gaussian Markov Random Field (IGMRF) [44].

Cressie model. The Cressie
Model, also referred to as a proper CAR model was proposed by Ver Hoef and Cressie [45] and Stern and Cressie [19]. The model considers a single set of random effects with an additional spatial correlation parameter ρ. The plausible range for ρ, 0 � ρ < 1 makes the Cressie model a proper CAR model and becomes the ICAR model for ρ = 1. According to this model, the random effect ψ has a multivariate normal distribution [16]: where the (i, j)th element of Q is defined as: The univariate full conditional distribution for ψ i |ψ −i can be written as: where ψ −i denotes the random effect vector with the i th component deleted. The conditional variance is the same as for the intrinsic CAR model and the conditional expectation is expressed as a weighted average of local random effects with weight ρ and zero overall average weighted by 1 − ρ. A drawback of this model pointed out by Rampaso et al. [16] is that the conditional variance depends on the number of neighbours even in the absence of spatial dependence.

Leroux model. The
Leroux spatial model, proposed by Leroux et al. [20] proposed the following distribution for the spatial random effects ψ i (from Eq 4) with a singular covariance matrix D. Leroux et al. [20] proposed the generalised inverse of D as: where R is the intrinsic autoregression matrix which represents the neighbourhood structure of the regions with typical element, where n i is the numbers of neighbours of region i and I(i � j) is an indicator function taking value 1 when i and j are adjacent. The term ρ is introduced as a spatial dependence parameter, ρ 2 [0, 1], with the two extreme cases giving rise to the independence model (i.e., ψ i = v i and D = σ 2 I, where I is an identity matrix) and intrinsic autoregression (i.e., ψ i = u i and D = σ 2 R − ). The conditional moments can be expressed as weighted averages of local moments [46]: where ψ −i denotes the random effect vector with the ith element deleted. For ρ close to 1, the conditional variance becomes close to σ 2 /n i and for ρ close to 0, the variance becomes close to σ 2 , that is independent of the number of neighbours n i [16]. The Leroux Model [46] is more flexible than the earlier models, namely, the BYM model [18] and the Cressie model [45]. The Leroux model overcomes the shortcoming of the Cressie model which depended on the number of neighbours even when there was no spatial dependence by inclusion of the spatial dependence parameter ρ which may take the value 0, thus making the variance term independent of the neighbourhood structure (σ 2 ). The Leroux model also captures the intrinsic autoregression specified by ICAR BYM Model when ρ = 1. The flexibility of the Leroux model has been one of the reasons of popularity of this model in spatial data analysis. This has also motivated the formulation of a BEL model using the Leroux prior structure in the present study, which can take the form of the BYM model with intrinsic autoregression (ρ = 1) and spatially independent model (ρ = 0).
The univariate full conditional distribution for ψ i |ψ −i can be written as:

Generalised Moran basis priors.
The generalised Moran basis priors are shown in section 2.2 in reference to the BEL spatial models. Hughes and Haran [14] introduced the generalised Moran basis priors within a parametric framework. The use of generalised Moran basis priors for spatial random effects is intended for dimension reduction in large spatial dataset while conducting spatial smoothing. The spatial generalised linear mixed effect models (SGLMM) [18] was extended by applying generalised Moran basis priors and was named Sparse SGLMM by Hughes and Haran [14]. The authors provided different applications using binary, count and continuous data following normal distributions to perform reasonably well in order to estimate the fixed effects more precisely with reduced dimension and computational time.

Formulation of BEL spatial model applying CAR prior structure
The approach of the present study is to extend the BEL semi-parametric models proposed by Chaudhuri and Ghosh [10] and Porter et al. [11] by applying the popular CAR prior structures (Leroux and BYM) for spatial random effects. The proposed Spatial Bayesian Empirical Likelihood Model with a CAR prior, SBEL-CAR takes two forms: a model with a BYM prior (intrinsic CAR prior) for the spatial random effects (SBEL-BYM) and a model with a Leroux CAR prior for these effects (SBEL-Leroux).
The SBEL-Leroux model is formulated below noting the special cases (ρ = 1, 0) give rise respectively to SBEL-BYM model with intrinsic autoregression and independent BEL model with an independent Gaussian prior for spatial random effects (SBEL-IG) proposed by Chaudhuri and Ghosh [10].
The SBEL-Leroux model utilises the parametric prior of the Leroux model for the spatial random effects, a Gaussian prior for the covariate fixed effects and EL to estimate the parameters of the SAE model (Eqs 3 and 4) without specifying the data distribution of Y i . Thus, this model formulation is different from those of Porter et al. [11] and Chaudhuri and Ghosh [10] in terms of prior specification for the spatial random effects. This also required a new MCMC algorithm to obtain the posterior distributions of interest.
The empirical likelihood (EL) estimating equations utilised for estimating (β, ψ) are in Eqs (5) and (6) The constrained optimisation of the EL estimating equations is well established [26]. Some R packages e.g., gmm [47] and emplik [48] perform the computation as well. A random walk MH sampling algorithm is proposed to fit the proposed model following the suggestions of Porter et al. [11]. These details are described below.

Prior distributions of SBEL-CAR models
The prior distribution of the random effect ψ is taken to be a Leroux prior given by Eq (14). Defining a precision parameter associated with the variance of the random effect as t ¼ 1 s 2 , the distribution of random effects ψ can be written as, where D is a singular covariance matrix with generalised inverse given by Eq (10). For a specific value of ρ and an intrinsic autoregression matrix R, the prior density of ψ can be specified as: The prior distribution of the fixed effects β can be specified following the suggestion of Chaudhuri and Ghosh [10] and Porter et al. [11] as, where g represents the Zellner prior [39], τ is the precision parameter, p is the number of covariates in the study and I p is an identity matrix of dimension p × p. We specifyβ ¼ β WLS , the weighted least squares estimate of β following the suggestion of Porter et al. [11].
There have been many recommendations about choice of g including choosing fixed scalar or introducing prior for g as well [49][50][51][52][53]. Geinitz [49] outlined three different choices of g and the intuitive interpretations of the weighting in resulting posterior from a computational perspective. The author mentioned if g is chosen to be 1, it implies 50% prior weight and choice of g as 10 implies 10% prior weights, whereas, increase of g towards infinity leads to a diffuse prior. So, g is fixed at 10 in the present study following the suggestion of Chaudhuri and Ghosh [10].
Then the prior density of β can be written as: The prior distribution of precision parameter τ can be taken as a Gamma distribution [11], i.e., The prior density of τ can be written as,

MCMC sampling algorithm
The MCMC sampling algorithm was designed and implemented in R and is motivated by the one provided by Porter et al. [11]. Hence there are similarities in the algorithm steps with differences in some models and equations induced by the new prior specification for the spatial random effects ψ i .

Obtaining starting values
Using the gmm package [47] in R and the EL estimating Eqs (5) and (6), the maximum empirical likelihood estimates (MELE) for β are obtained. The initial values for β are chosen randomly from the prior distribution, weights w i are set to 1/n and s 2 i is replaced using the calculated residual variance for estimation purposes. Using the MELE of β and setting ψ ¼0 gives the starting values of m i ¼ x 0 i β þ ψ. The EL weights w i are calculated to satisfy the constraint: where m j (y i , μ) are the estimating equations (j = 1, 2) presented in (5) and (6). This calculation can be made by constrained optimisation using el.test from the emplik [48] package in R.

Sampling spatial random effects, ψ
To sample ψ, a multivariate normal proposal ψ � � MVN(0, S) is used where the proposal covariance S is tuned by pilot chains [54]. The proposed values are then utilised in the estimating equations below to generate weights w � i : To generate the set of weights, the MELE estimate of β from step 1 is used. If w � i satisfies the constraint specified in Eq (20), a Metropolis-Hastings (MH) step is performed with the following posterior density ratio: where t = 1, 2, 3, . . . is the iteration index and ψ (t−1) � is the value of ψ � in the previous iteration. When t = 1, in the first iteration, ψ ðtÀ 1Þ� ¼ ψ 0� ¼0. To compute the ratio γ ψ , D − must be estimated given ρ. Ideally, in a parametric set up, ρ is simultaneously estimated in the MCMC algorithm using an appropriate prior distribution and proposal distribution for ρ. A similar approach can be adapted in the semi-parametric approach as well. However, the number of parameters estimated using the SBEL approach does not leave enough information for effective sampling of ρ. The parameters ψ and ρ are strongly interdependent, which results in a very difficult posterior space for the Random Walk Metropolis-Hastings to adequately explore and obtain independent Monte Carlo draws from. In our investigations of MCMC sampling with free ψ and ρ, we found that increases the MH kernel variance did not result in more independent samples as even a small increase resulted in a very low acceptance rate for new parameter values. It is for this reason we decided to fix one parameter and sample the other. It is to be noted that in sampling spatial random effects, use of Gibbs sampling is predominant in the literature [55,56]. In SBEL approach, we cannot use Gibbs sampling due to lack of likelihood for the response variable. Hence, we decided to fix one parameter, ρ and sample ψ, the spatial random effects. A grid search approach is recommended to find an appropriate ρ using the SBEL-CAR model structure. The posterior samples under this model can be drawn for each of the proposed values of ρ and identify the most suitable ρ for a given dataset can be determined using a model performance criterion, such as WAIC [57,58]. The fixed value of ρ is then used in the algorithm to sample the spatial random effects. Please see Appendix 2 of S1 File for illustration.

Sampling the fixed effects, β
The vector β is sampled using a MH random walk step with a multivariate normal proposal, β � � MVNð � β; Σ β Þ with proposal covariance tuned on the basis of pilot chains [54]. If the generated weights w � i utilising the proposed values β � verify the constraints specified in (20), a MH step is performed having posterior density ratio as: where β � ðtÀ 1Þ and w � iðtÀ 1Þ are the values of β � and w � i in the (t − 1)th iteration respectively. For t = 1, β � (0) and w � ið0Þ are replaced by the initial estimates of β and weights w i are generated using the initial β. As described above, g is the Zellner prior [39]. and � β is the weighted least squares estimate of β [11].
The proposed values are accepted if γ β > u β where u β � Unif(0, 1). If Eq (20) is not satisfied, β (t) is set equal to β (t−1) . Notice that this step is similar to that proposed by [11]. The difference lies in using the values of ψ � from step 2, which was estimated considering the Leroux model structure.

Sampling τ
Following the suggestion of Porter et al. [11], τ is sampled with a Gaussian proposal as t � 0 � Nðt; S t Þ, with a proposal variance tuned based on pilot chains and accepted according to a MH step with posterior density ratio as: where, In the numerator of the posterior density ratio in Eq (25), the current values of β, ψ are utilised with the proposed values of τ, τ � . In the denominator, all the values of the previous iteration are used, and τ � is accepted if γ τ > u τ , u τ � Unif(0, 1).
As per step 3, this step is also similar to that given by Porter et al. [11] with changes in the posterior density ratio due to the estimation of ψ � in step 2.

5.
Steps 2-4 are repeated until convergence. After convergence, the samples drawn for each of the parameters of interest ψ, β and τ are stored as draws from the desired posterior distribution and used to obtain inferences of interest.

Implementation
The SBEL-CAR models (SBEL-BYM and SBEL-Leroux) were fitted in R using the MH algorithm (section 3.2). An R package called BELSpatial is made available in github (https:// github.com/Farzana-Jahan/BELSpatial) that contains all the necessary code to draw posterior samples of interest from the proposed SBEL-CAR models using any areal data set. The package also contains code to draw posterior samples from the BSHEL model [11] and independent Gaussian model [10]. To implement the algorithm, the R package emplik [48] was used to calculate the EL weights corresponding to the estimating equations and the package gmm [47] was used to calculate the maximum EL estimates of the regression coefficients from the data in order to obtain the starting values of the regression coefficients in the algorithm. The initial values for the other parameters of interest were randomly generated from the respective prior distributions. Three parallel chains of 1 million iterations with a burn in period of 100,000 iterations, thinned by 10, were run to fit the models in this present study described below. The convergence of the MCMC chains was assessed by using visual diagnostics and the Gelman-Rubin diagnostic. Since no distribution is assumed for the underlying data, a larger number of iterations is needed to obtain convergence compared to an traditional Bayesian parametric spatial models. In addition to fitting the proposed SBEL-CAR models, the SBEL-IG model using independent Gaussian priors proposed by Chaudhuri and Ghosh [10] and the BSHEL model proposed by [11] were also implemented to compare the performances of the BEL spatial models following the algorithms provided by the corresponding authors. We acknowledge the existence of a Hamiltonian Monte Carlo (HMC) algorithm [38] and an R package named elhmc [59] for fitting the SBEL-IG model, but in the present study we have employed a MH algorithm instead to estimate the posteriors for all four spatial BEL models.
To compare the performance of the SBEL models with their parametric analogues, the IG model (using an independent Gaussian prior for spatial random effects), the BYM and the Leroux models were fitted using the CARBayes package [60] in R. The generalised Moran basis prior applied by the sparse SGLMM model was fitted using the R package ngspatial [61]. For comparison purposes, the number of iterations, burn in period and thinning interval were kept the same for the parametric and spatial BEL models.

Comparison of model performance
The posterior summaries of the parameters of interest (fixed effects and precision parameters) using the parametric and SBEL models, along with Gelman Rubin diagnostic value for convergence and WAIC [57] were used as measures of performance of the models. The WAIC, which has been used in spatial models by many authors, e.g., Aswi et al. [62] and Duncan and Mengersen [63]; a smaller value of WAIC indicates a better model fit [57]. There have been two adjustments of WAIC suggested in the literature and both are viewed as the approximations to cross validation [58]. The second adjustment to WAIC, proposed by Gelman et al. [58] gives more stable version by computing variances for each data points and then summing. In this present study, the stable version of WAIC [58] is used.
It is to be acknowledged that there are other available model selection criteria such as Kfold cross validation, information criteria such as DIC, leave one out cross validation among others [64]. All the model selection criteria described in [64] are for non-spatial Bayesian models. These criteria are not suitable for validating a spatial model [63]. The reason of not considering any of these cross-validation techniques for model selection is that the spatial component of the model adds an additional objective that often competes with goodness-offit. That is, goodness-of-smoothing and goodness-of-fit criteria will often lead to different "best" models. Additionally, in a spatial setting for small area level data, leaving out one or more small areas from the model as in cross validation would completely change the neighbourhood patterns, and underlying spatial dependence. Thus the goodness of fit and model selection for spatial modelling needs different considerations [63]. So we followed the recommendations of Duncan and Mengersen [63] and chose WAIC as the model selection criteria, as it is showed valid in the context of spatial models as well. While this criteria is not perfect for spatial models, more research needs to be conducted to find out ways to apply cross validation or other selection criteria for Bayesian spatial models in small area level, which is beyond the scope of this current study.

Applications
This section presents a thorough investigation of the SBEL-CAR models. The models are applied to two well-known areal spatial datasets, namely the Scottish lip cancer data and the North Carolina Sudden Infant death syndrome (SIDS) data. The models are compared with existing parametric Bayesian spatial models and other existing SBEL models (for details, see section 3.4). The comparison with the Dirichlet process (DP) prior under BEL framework proposed by Chaudhuri and Ghosh [10] are not made in this article, considering the similar output for IG and DP models reported by these authors. A simulation study is also made to compare the performances of the parametric and semi-parametric spatial models. At last an application of proposed and existent SBEL models along with the parametric spatial models are made to very recent COVID-19 data for Europe in 2020 at the small area level.

The Scottish lip cancer data
The Scottish lip cancer data is a publicly available spatial dataset compiled by Kemp et al. [22] and Breslow and Clayton [65]. The Scottish lip cancer data has been analysed by many authors including Clayton and Kaldor [66] Leroux et al. [46] etc. It contains data on lip cancer incidence in males registered during the 6 years from 1975 to 1980 for 56 small areas (counties) of Scotland along with information on expected incidence and sun exposure (spatial covariate). In this project, we use this well-known data set to illustrate the proposed SBEL-CAR models.
To fit the proposed SBEL-CAR approaches with the FH model described in Eqs (3) and (4), let Y i be the log of the observed standardised incidence ratios, logSIR calculated from the data by taking the ratio of observed and expected incidences in each of the 56 counties and let μ i be the corresponding expected logSIR for each county. Here, X is a 56 by 2 design matrix, the first column of which corresponds to an intercept parameter and the second column of which corresponds to the measure of sun exposure in each county. The estimating Eqs (5) and (6) are used in this context to draw posterior samples from the SBEL-BYM and SBEL-Leroux models using the MH algorithm for MCMC estimation described in section 3.2. The independent Gaussian spatial model under the BEL framework (SBEL-IG) [10] and the BSHEL model [11] are also fitted in the same context in order to compare the performances. Under the parametric setup, the following model was considered with the CAR priors (ICAR BYM and Leroux), Moran basis Priors (via Sparse SGLM) and independent Gaussian (IG) priors are employed to model the spatial random effects ψ: The parametric model is chosen to make it compatible with the estimating equations utilised to fit the spatial BEL models. Alternative choices of models are also possible for both the parametric and semi-parametric set ups, such as a Poisson model for the observed incidence utilising expected incidence as an offset variable. The estimating semi-parametric models need to be adjusted to make these comparable.
The comparative performance of the models is summarised in Table 1. It is observed that all the SBEL models provided similar estimates of posterior means for fixed effects. However, in the SBEL models, the 95% credible intervals were wider than those of the parametric models.
Hence the results obtained are encouraging as they show that the estimation for the parameters β made from SBEL-CAR models are similar to those obtained by using parametric models. It is noted that the Scottish lip cancer example is a benchmark data set for which parametric assumptions hold, so it is not surprising to see parametric models performing better according to WAIC and posterior distributions. In cases in which it is not straightforward to assume the parametric distribution, SBEL models can be useful to draw posterior inference and make predictions. Figs 1 and 2 show posterior densities obtained by applying the SBEL and parametric spatial models using different prior structure such as: BYM, Leroux, Moran basis prior and IG prior. It is noted that the BSHEL model outperforms the SBEL-BYM and SBEL-Leroux models based on the precision of the posterior estimates of the regression parameters, though the posterior means are all concentrated around the same value for this case study. According to the WAIC reported in Table 1, the prediction performance of the SBEL models is very similar irrespective of the choice of spatial priors. For spatial maps showing smoothed SIRs obtained by SBEL models and visualisation of posterior distribution from each of the parametric and SBEL models, see Figs A1 and A2 in S1 File.

The North Carolina SIDS data
The North Carolina sudden infant death syndrome (SIDS) data is also a publicly available benchmark dataset containing the annual number of SIDS and death rates per 1000 live births for each of 100 counties and each of the years between 1974-1975. The data set was first presented by Atkinson [23] and subsequent analysis and additions were made by Symons et al. [67] and Cressie [68]. The augmented version of the data is printed in Cressie, N. [69]. A more recent introduction to this data set is given by Bivand [70]. For evaluating the performance of the proposed SBEL models in this project, the aggregated counts of SIDS for 1974-78 are modelled using the corresponding spatial covariate (number of non-white births) at small area levels.
Similar to the previous case study, the proposed SBEL-CAR models are fitted considering Y i as the log of the observed standardised mortality ratios, logSMR calculated from the data by taking the ratio of observed and expected mortality in each of the 100 counties. Here μ i is the expected logSMR for each county and X is a 100 by 2 design matrix. The available covariate information in the data set is the number of non-white births in each county, denoted by the second column in X. Other spatial BEL models and parametric spatial models are also fitted to compare the model performance similar to the Scottish Lip Cancer application.
The posterior estimates of regression parameters and the prediction performance of each models fitted to the data are presented in Table 2. Among the parametric models, the Leroux model shows the best performance and among the SBEL models, the BSHEL model performs the best, although the WAIC values of all the SBEL models are very close. Figs 3 and 4 show the posterior densities obtained using SBEL models and parametric models. The spatial maps showing raw and smoothed SMRs obtained by fitting BEL spatial models are shown in the Fig A4 in S1 File. All these results for this case study also suggest that the choice of spatial priors is very important when fitting Bayesian parametric spatial models using BYM, Leroux, generalised Moran basis priors or IG for spatial random effects but the performance of SBEL models is very similar irrespective of choices of spatial prior.

Simulated data
For a more detailed investigation of SBEL models on areal spatial data, data were generated on expected counts, based on an underlying spatial random field (USRF) following the method provided by Aswi et al. [62]. A covariate was also generated and the observed counts simulated so that they are influenced by the covariate effect as well as the USRF. The synthetic data were generated for a small (25 areas) and a large number of areas (100) for strong and weak spatial autocorrelation among the small areas. Each of the parametric Bayesian spatial models (BYM, Leroux, IG and Moran basis) and SBEL models (SBEL-BYM, SBEL-Leroux, BSHEL and SBE-L-IG) were fitted to five realisations of each of the simulated data scenario taking the log of standardised incidence ratios, log(SIR) (ratios of observed and expected disease counts) for each area as the response variable. The model performance was compared by considering the maximum values of WAIC for each simulation scenario.
An additional scenario of disease data was considered, under which the log(SIR) followed a mixture distribution consisting of different Gaussian components with outliers.
The comparative performance of the models under the different simulation scenarios (Table 3) is presented in Table 4. The parametric models fitted assumes a Gaussian distribution of the response variable (Eq 29). However, the data generated for a small number of areas (25), does not necessarily reflect a Gaussian distribution (Fig 5). As a result, the SBEL models performed better than the parametric spatial models (smaller WAIC, Table 4). When the number of areas increased (to 100), the deviation from the normality assumption was reduced ( Fig  5) and as a result the parametric spatial models performed better than the SBEL models. Similar results are observed for both strong and weak spatial autocorrelation. The results obtained for scenario 3, where the data contain outliers and mixture distributions (Fig 6) indicate that the SBEL models perform better than the parametric models for both small and large number of areas (smaller WAIC, Table 3).
From the simulation, it can be observed that if the underlying distribution of the response variable is irregular, contains noise or does not adhere to the parametric assumptions for the underlying distribution, applying SBEL models may be preferable in terms of model fit and precision of parameter estimates. This might arise in the case of small number of areas so that the parametric assumption is not satisfied or even in situations where the underlying areal spatial data comprise mixture of distributions or include extreme observations. Porter et al. [11] also shows the superior performance of BSHEL model in the presence of outliers.

COVID-19 data
The SBEL models and parametric spatial models were fitted to the number of deaths due to COVID-19 in the countries in Europe during three periods of 2020 (Jan-April, May-Aug and Sep-Dec). The COVID-19 data has been compiled and updated by the John Hopkins University's (JHU) Coronavirus Resource Center (https://coronavirus.jhu.edu/map.html). All the information on confirmed COVID-19 cases and deaths from the JHU data source have been combined with other information from a range of data sources such as population, number of hospital beds available and so on in "Our World in Data" website https://covid. ourworldindata.org/. For the application of the models, the number of deaths recorded each day in each country of Europe was aggregated for each of the three periods of the year 2020 (Fig 7). The response variable is considered as the number of new deaths per million and a covariate was taken as the proportion of the population aged 65 and over (source: World Bank, compiled by "Our World in Data" website).
The variation in observed deaths per million in Europe is visible in the three time periods of 2020. To account for the temporal correlation, one option is to add a temporal component using a linear time trend, along with a spatial-temporal interaction term [71]. In the present article, we are concentrating on spatial models only and ignoring the temporal component. Adding temporal component and spatial temporal interaction might improve the model performance, which can be checked as a future extension of this study. The Fig 7 also exhibits some groups of countries having consistently lower numbers of deaths over the 3 periods, some of the countries show increase or decrease over time. More investigation is possible using the dataset in order to determine clusters and patterns of observed deaths over time, which is beyond the scope and focus of this current study.
The response variable, number of new deaths per million was log transformed in order to satisfy the required parametric assumptions (Eq 29). For the sake of comparison, Y i = log(new deaths per million) is used as the response in the SBEL models as well, noting that this is not necessary and SBEL models can work without the log transformation to the response variable.
The model performance statistic, WAIC, is presented in Table 5 for each model in each time period. From this table, it is evident that SBEL models outperformed the parametric    The posterior distributions of the regression parameters and the precision parameters for period 1 using different models are shown using box plots in Fig 9. It can be observed that the posterior means for the regression parameters using all the parametric and SBEL models are very similar with different variability. Among all the models, BSHEL has the highest number of outliers in the posterior distribution of the regression parameters. This also resulted in the lowest mean with wider credible interval for the precision parameter (τ) for the BSHEL model (Fig 9). Porter et al. [11] used three different case studies and the precision parameter behaved differently for different datasets. Hence it can be asserted that the BSHEL model is not a good choice for modelling COVID-19 death data for 2020 in Europe.
The numerical values of the posterior summaries using each model for periods 1, 2 and 3 are shown in Table A1 in Appendix 1 of S1 File. The spatially smoothed values for new deaths per million in Europe for the period 3(Sep-Dec) are shown in Fig A5 in Appendix 1 of S1 File.

Discussion
The present study investigates a Bayesian empirical likelihood (BEL) framework for modelling spatial data at small area level. The article developed spatial BEL (SBEL) models employing the popular CAR structure priors for the spatial random effects to control for the underlying spatial autocorrelation. A MCMC algorithm was proposed for SBEL-CAR models using MH techniques. A detailed comparison of the existing SBEL models (SBEL-IG and BSHEL) with the proposed SBEL-CAR models was made using two benchmark case studies employing real life spatial data at small area level, a simulation study and a study of COVID-19 deaths in Europe. A comparison with the parametric spatial models analogous to the semi-parametric models was also made in this study.
The comparison of the model performances shows that parametric spatial models are a better choice for situations in which the underlying parametric assumptions are satisfied as reflected by the model performance in the case studies using Scottish Lip cancer data and North Carolina SIDS data. However, the results of the case studies suggested that the estimated regression coefficients of spatial BEL models were similar to those estimated by parametric spatial models with wider credible intervals, which is expected as the BEL models were using the empirical likelihood of the data to obtain the posterior without imposing any distributional assumptions on the data. The simulation study revealed that spatial BEL models can outperform their corresponding parametric spatial models if the underlying assumptions regarding the data distributions are not fully satisfied. This may occur due to smaller number of areas or the presence of irregularities in the form of outliers or a mixture of different distributional components in the data. BEL has already gained popularity for providing robust estimates in the case of model misspecification [5]. Such situations might occur in the case of spatial data at a small area level, in which case SBEL models can be chosen over parametric spatial models. SBEL models have already been demonstrated to perform better than a parametric spatial model in the presence of outliers [11].
The application of the SBEL and parametric models to COVID-19 data revealed that the SBEL models performed better than the parametric models, since the distributional assumptions of the response variable were violated. This supports the finding obtained from the simulated data. This result shows that spatial BEL models could be used to analyse the very recent COVID-19 data at small area level to reveal significant trends and relationships with improved prediction performance. This is an important area of future contributions using the already existent and extended SBEL models. This is to note that, the analysis performed with COVID-19 deaths data was not directly related to any biological insights, but it provided methods that can enable us to validate few assumptions using the data and model outcomes. For example, from the very beginning of pandemic we are listening to the facts that COVID-19 is more severe in elderly people and immunocompromised people. In the analysis conducted in this article modelled COVID-19 deaths using the covariate proportion of population aged 65 and over in each of the 54 countries in Europe during 2020. The model revealed significant positive coefficient of the regression which in a way validated the assertion of more COVID 19 deaths on the average to the populations with higher proportion of elderly people. The proposed model was able to model the data irrespective of the irregularities and non-normality of the response variable and enabled to validate assertion with support of data. This analysis conducted is an indication of how spatial BEL models applied to COVID 19 deaths or similar data with irregularities can validate multiple research questions.
SBEL models using different priors were seen to perform similarly for benchmark datasets, while notable change in model performance is noted for Bayesian parametric spatial models with respect to the choice of spatial priors [63]. The SBEL models also showed substantive differences in the WAIC values for the COVID-19 dataset. More detailed inspection revealed that for different data sets, different SBEL models obtained the lowest WAIC. The SBEL-IG model [10] performed best for Scottish Lip cancer data analysis, while the BSHEL model [11] performed the best among the other SBEL models for the North Carolina SIDS data. For COVID-19 application, the SBEL-BYM, which is a model proposed in this study using an ICAR prior for spatial random effects within a Bayesian hierarchical modelling framework outperformed all the parametric models and the already existent BSHEL and SBEL-IG models. It is known that the performance of parametric spatial models depends on the application and data set [15], and similarly the choice of SBEL model for modelling a small area level spatial data may be decided after comparing the performance of each of the spatial BEL models for each application.
The present study has contributed to the Bayesian Empirical Likelihood literature by developing two new Bayesian semi-parametric spatial models using two popular CAR prior structures. The proposed BEL-CAR models can be extended to apply other choices of CAR priors, such as the Cressie prior [68] and the Lu prior [21] in a relatively straightforward manner. The study showed superior performance of SBEL models in situations in which parametric distributional assumptions do not hold for the data.
One limitation of the proposed SBEL models is the computational cost of the models. It requires a relatively larger number of iterations for the convergence of the MCMC algorithm and thinning is required to avoid autocorrelation. The simulation studies conducted in this paper could be extended by including more scenarios to investigate the choices of spatial models (parametric vs semi-parametric and choice of spatial priors under BEL framework) to provide more detailed insights into the choice of appropriate models to analyse areal spatial data.